Packages
The course uses a Python toolkit organized by role.
For data wrangling we use pandas and polars. pytimetk streamlines time-aware feature engineering, preprocessing, and visualization, with Plotly providing the interactive charts.
For modeling we rely on scikit-learn for pipelines and preprocessing, and the Nixtla ecosystem for forecasting:
- statsforecast for classical/statistical models and efficient
forecasting primitives
- mlforecast for machine learning models (linear, tree-based) with
exogenous features
- neuralforecast for deep learning models
- utilsforecast and coreforecast for backtesting utilities, evaluation, and feature generators
For hosted models, Nixtla’s TimeGPT is accessed via the nixtla Python client (API key required). Agents are explored via the timecopilot package.
All required packages are managed with conda using the provided environment file:
conda env create -f src/env-setup/conda_env_setup.yml
conda activate modern_tsf
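After activating the environment, a quick import check can confirm that the toolkit is available. The following is a minimal sketch, not part of the course material: the helper `missing_modules` is a hypothetical name, and the import names in `COURSE_MODULES` are assumed from the package list above (note that scikit-learn imports as `sklearn`).

```python
from importlib.util import find_spec

# Import names assumed from the package list above (an assumption, not taken
# from the environment file); scikit-learn is imported as "sklearn".
COURSE_MODULES = [
    "pandas", "polars", "pytimetk", "plotly", "sklearn",
    "statsforecast", "mlforecast", "neuralforecast",
    "utilsforecast", "coreforecast", "nixtla", "timecopilot",
]


def missing_modules(modules):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


missing = missing_modules(COURSE_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All course modules are importable.")
```

If any module is reported missing, recreating the environment from the provided file is usually the simplest fix.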
Datasets
Email Subscribers
A company decided to change how it sells its products, moving from a
purely physical store to a more modern, digital solution. To do so, it
opened an online web store that integrates an e-commerce platform, where
its “virtual” customers can buy all the merchandise.
To monitor this new business solution, it adopted a few
well-known data analytics tools.
Google Analytics has been set up on the web store pages to collect data on page views, sessions, and organic searches. This helps the company understand whether its website is gaining popularity.
Moreover, MailChimp is used to track all the customers that buy a product and subscribe to the web store.
Finally, marketing events such as discount programs and new product launches are promoted through several social network channels.
All these data are stored in the company database and can be used to analyze the factors that affect the web store sales.
M4 Competition Hourly
The M4 Competition is a well-known time series forecasting competition organized by Spyros Makridakis. It provides a large dataset of 100,000 time series from various domains, including finance, economics, and demographics. The goal of the competition is to develop accurate forecasting models for these time series.
https://www.unic.ac.cy/iff/research/forecasting/m-competitions/m4/
We will use a sample of the M4 Hourly dataset, which consists of hourly time series data. The dataset contains multiple time series, each identified by a unique ID.
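A dataset with several series keyed by an ID is commonly stored in "long" format, with one row per series and timestamp. The sketch below illustrates that layout with pandas. The values are synthetic, and the column names `unique_id`, `ds`, and `y` follow the Nixtla convention; assuming the course sample uses the same names is a guess, not something stated above.

```python
import pandas as pd

# Synthetic stand-in for two hourly series; the values are made up.
# Column names follow the Nixtla long-format convention (unique_id, ds, y),
# which is an assumption about how the course sample is stored.
timestamps = pd.date_range("2024-01-01", periods=4, freq=pd.Timedelta(hours=1))

df = pd.DataFrame(
    {
        "unique_id": ["H1"] * 4 + ["H2"] * 4,
        "ds": list(timestamps) * 2,
        "y": [10.0, 12.0, 11.0, 13.0, 100.0, 98.0, 101.0, 99.0],
    }
)

# One row per series: number of observations and the covered time span.
summary = df.groupby("unique_id").agg(
    n_obs=("y", "count"),
    start=("ds", "min"),
    end=("ds", "max"),
)
print(summary)
```

Grouping by the ID column is the usual way to inspect or process each series separately while keeping everything in a single table.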